Learning Multi-modal Dictionaries: Application to Audiovisual Data

نویسندگان

  • Gianluca Monaci
  • Philippe Jost
  • Pierre Vandergheynst
  • Boris Mailhé
  • Sylvain Lesage
  • Rémi Gribonval
چکیده

This paper presents a methodology for extracting meaningful synchronous structures from multi-modal signals. Simultaneous processing of multi-modal data can reveal information that is unavailable when handling the sources separately. However, in natural high-dimensional data, the statistical dependencies between modalities are, most of the time, not obvious. Learning fundamental multi-modal patterns is an alternative to classical statistical methods. Typically, recurrent patterns are shift invariant, thus the learning should try to find the best matching filters. We present a new algorithm for iteratively learning multimodal generating functions that can be shifted at all positions in the signal. The proposed algorithm is applied to audiovisual sequences and it demonstrates to be able to discover underlying structures in the data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detection of Synchronous Audiovisual Events

This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony strongly contributes to integrate cross-modal information in humans. Thus, we define meaningful audiovisual structures as temporally proximal audio-video events. Audio and video signals are represented as sparse decompositions...

متن کامل

Uncorrelated Multi-View Discrimination Dictionary Learning for Recognition

Dictionary learning (DL) has now become an important feature learning technique that owns state-of-the-art recognition performance. Due to sparse characteristic of data in real-world applications, DL uses a set of learned dictionary bases to represent the linear decomposition of a data point. Fisher discrimination DL (FDDL) is a representative supervised DL method, which constructs a structured...

متن کامل

Audiovisual quality integration for interactive communications

This paper investigates multi-modal aspects of audiovisual quality assessment for interactive communication services. It shows how perceived auditory and visual qualities integrate to an overall audiovisual quality perception in different experimental contexts. Two audiovisual experiments are presented and provide experimental data for the present analysis. First, two experimental contexts are ...

متن کامل

Online Multi-Modal Robust Non-Negative Dictionary Learning for Visual Tracking

Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative diction...

متن کامل

Assessment of learning style based on VARK model among the students of Qom University of Medical Sciences

Introduction: Learning is a dominant phenomenon in human life. Learners are different from each other in terms of attitudes and cognitive styles which effect on the learning of people. In this connection, VARK learning style assess the students base their individual abilities and method for obtaining much information from environment in dimensions of visual, aural, read/write, and kinesthetic. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006